IR-Specific Searches at TREC 2007: Genomics & Blog Experiments
نویسندگان
چکیده
This paper describes our participation in the TREC 2007 Genomics and Blog evaluation campaigns. Within these two tracks, our main intent is to go beyond simple document retrieval, using different search and filtering strategies to obtain more specific answers to user information needs. In the Genomics track, the dedicated IR system has to extract relevant text passages in support of precise user questions. This task may also be viewed as the first stage of a Question/Answering system. In the Blog track we explore various strategies for retrieving opinions from the blogsphere, which in this case involves subjective opinions about various targets entities (e.g., person, location, organization, event, product or technology). This task can be subdivided in two parts: 1) retrieve relevant information (facts) and 2) extract positive, negative or mixed opinions about the specific entity being targeted. To achieve these objectives we evaluate retrieval effectiveness using the Okapi (BM25) and various other models derived from the Divergence from Randomness (DFR) paradigm, as well as a language model (LM). Through our experiments with the Genomics corpus we find that the DFR models perform clearly better than the Okapi model (relative difference of 70%) in terms of mean average precision (MAP). Using the blog corpus, we found the opposite; the Okapi model performs slightly better than both DFR models (relative difference around 5%) and LM (relative difference 7%) model.
منابع مشابه
University of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier
In TREC 2007, we participate in four tasks of the Blog and Enterprise tracks. We continue experiments using Terrier [14], our modular and scalable Information Retrieval (IR) platform, and the Divergence From Randomness (DFR) framework. In particular, for the Blog track opinion finding task, we propose a statistical term weighting approach to identify opinionated documents. An alternative approa...
متن کاملExperiments in TREC 2007 Blog Opinion Task at CAS-ICT
This paper describes our participation in TREC 2007 Blog Track Tasks: Opinion retrieval and Polarity classification. As for Opinion retrieval task, a two-step approach is used to retrieve opinion relevant blog unit (that is blog post and its comments) given a query after filtering Spam blog and extracting blog unit. With Polarity Classification, Drag-push [1] based classifier is employed to get...
متن کاملOpinion Retrieval Experiments Using Generative Models: Experiments for the TREC 2007 Blog Track
Ranking blog posts that express opinions regarding a given topic should serve a critical function in helping users. We explored a couple of methods for opinion retrieval in the framework of probabilistic language models. The first method combines topic-relevance model and opinion-relevance model, at document level, that captures topic dependence of the opinion expressions. The second method com...
متن کاملWhat's New at TREC: Blog and Legal Discovery Search at TREC-2006
This past year, the Text REtrieval Conference (TREC) started two new tracks. One was the Blog track – given a large collection of blog posts and their comments, the task was to locate opinions about products, people, organizations, etc. The other new track was the Legal Track. This track seeks to build test collections for searches that occur during the discovery portion of a lawsuit. The Legal...
متن کاملUniversity of Glasgow at TREC 2010: Experiments with Terrier in Blog and Web Tracks
In TREC 2010, we continue to build upon the Voting Model and experiment with our novel xQuAD framework within the auspices of the Terrier IR Platform. In particular, our focus is the development of novel applications for data-driven learning in the Blog and Web tracks, with experimentation spanning hundreds of features. In the Blog track, we propose novel feature sets for the ranking of blogs, ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007